Data exploration is not only about creating numbers and summary statistics. Sometimes a nice plot reveals more interesting insights into the data. In this exercise, we apply what we’ve just learned about plots in R, and in particular in ggplot2. Now we’re going to use all of the gapminder GDP data!

1

Load the gapminder GDP data in the long format as in the Summary Statistics exercise. Make sure not to exclude the time period between 1970 and 2001.
Remember that we applied the filter() function to select the individual time periods.
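
As a rough sketch of this step, the snippet below reshapes a small wide table into the long format and applies no filter() on the year, so every year is kept. The table gdp_wide, its column names (country, year, gdp) and all values are made-up placeholders, not the real gapminder file; replace them with the data you loaded in the Summary Statistics exercise.

```r
library(dplyr)
library(tidyr)

# Made-up placeholder values in a wide layout (one column per year).
# These are NOT real gapminder figures.
gdp_wide <- data.frame(
  country = c("A", "B"),
  `1960` = c(100, 200),
  `1985` = c(150, 260),
  `2011` = c(210, 330),
  check.names = FALSE
)

# Wide to long: one row per country and year.
# There is no filter() on year here, so all years stay in the data.
gdp_long <- gdp_wide %>%
  pivot_longer(cols = -country, names_to = "year", values_to = "gdp")

gdp_long
```

Note that pivot_longer() stores the former column names as text, so year is a character variable here.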

Previously, we only analyzed how the period 1960-1969 compares to the period 2002-2011. The nice thing about plots is that we can use the whole range of years and still identify differences between various periods. Our plot of choice is therefore a line plot, which gives us a nice time series.

2

Plot the gapminder data as a line plot to get a time series.
Instead of geom_point as in the slides, the geom’s name is geom_line. Moreover, in the aesthetics definition aes() you may want to set the grouping to group = 1; otherwise ggplot treats each year as its own group and thinks you want to plot one line for each year.
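
A minimal sketch of such a line plot could look as follows. The data frame gdp_long and its columns year and gdp are assumptions with made-up values, and year is stored as text to mimic a discrete x-axis; your own object and column names may differ.

```r
library(ggplot2)

# Made-up placeholder series, NOT real gapminder values.
# year is stored as text, so ggplot treats it as discrete.
gdp_long <- data.frame(
  year = as.character(1960:2011),
  gdp  = seq(1000, by = 50, length.out = 52)
)

# group = 1 puts all observations into a single group,
# so geom_line() connects them into one line
p <- ggplot(gdp_long, aes(x = year, y = gdp, group = 1)) +
  geom_line()

p
```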

Admittedly, this may not be the best approach to identify differences between the time periods directly, because the plot does not show where our time periods start and end. Luckily, this can be fixed using at least two approaches. Let’s start with the first one: using colors for different time periods. For this purpose, we need an indicator variable that serves as a grouping variable and assigns a different color to the line in each time period.

3

Create an indicator variable for the time periods 1960-1969, 2002-2011 and the time in between.
A combination of mutate() and if_else() lets you create new variables rather easily. Moreover, to get sensible legend labels later, define the values as strings.
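
One possible way to build the indicator is sketched below with two nested if_else() calls inside mutate(). The names gdp_long, gdp_periods and period, as well as the values, are placeholders chosen for illustration, and the sketch assumes that year is stored as text, which is why it is converted with as.integer() before comparing.

```r
library(dplyr)

# Made-up placeholder series, NOT real gapminder values
gdp_long <- data.frame(
  year = as.character(1960:2011),
  gdp  = seq(1000, by = 50, length.out = 52)
)

# Nested if_else(): first period, last period, everything else in between.
# The labels are strings so that they can serve as legend labels later.
gdp_periods <- gdp_long %>%
  mutate(
    period = if_else(
      as.integer(year) <= 1969, "1960-1969",
      if_else(as.integer(year) >= 2002, "2002-2011", "1970-2001")
    )
  )

count(gdp_periods, period)
```

With more than three categories, case_when() from dplyr is usually easier to read than nested if_else() calls.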

Once our indicator variable is set up, it’s plotting time again. We can simply re-use our code from before and define a grouping color in the aesthetics definition. Try it out!

4

Plot the line plot once again with different colors for the different time periods.
In the aesthetics definition aes(), you can choose the option color = indicator_variable to define the grouping.
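
The following sketch combines the indicator with the line plot. All object and column names (gdp_periods, period, gdp) and the values are assumptions for illustration. group = 1 is kept so that the observations still form one connected line, while color = period colors its segments by time period.

```r
library(dplyr)
library(ggplot2)

# Made-up placeholder series with a period indicator, NOT real gapminder values
gdp_periods <- data.frame(
  year = as.character(1960:2011),
  gdp  = seq(1000, by = 50, length.out = 52)
) %>%
  mutate(
    period = if_else(
      as.integer(year) <= 1969, "1960-1969",
      if_else(as.integer(year) >= 2002, "2002-2011", "1970-2001")
    )
  )

# One connected line (group = 1), colored by the indicator variable
p_color <- ggplot(gdp_periods,
                  aes(x = year, y = gdp, group = 1, color = period)) +
  geom_line()

p_color
```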

Now we can see some visual differences between the time periods. One last issue, however, is that there are far too many labels on the x-axis. A more sensible labeling approach would be to create axis breaks every ten years.

5

Create some prettier, i.e., more sensible, breaks for the x-axis.
You can modify the x-axis with scale_x_discrete() and its breaks with the option breaks = breaks_vector.
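
A short sketch of ten-year breaks is shown below. It assumes that year is stored as text, so the breaks are built as strings to match the discrete axis values; gdp_long and its values are made-up placeholders.

```r
library(ggplot2)

# Made-up placeholder series, NOT real gapminder values
gdp_long <- data.frame(
  year = as.character(1960:2011),
  gdp  = seq(1000, by = 50, length.out = 52)
)

# One break every ten years, as strings because the x-axis is discrete
breaks_vector <- as.character(seq(1960, 2010, by = 10))

p_breaks <- ggplot(gdp_long, aes(x = year, y = gdp, group = 1)) +
  geom_line() +
  scale_x_discrete(breaks = breaks_vector)

p_breaks
```

Only the years listed in breaks_vector get a tick label; the other years are still plotted.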